Let’s Play Mono-Poly: BERT Can Reveal Words’ Polysemy Level and Partitionability into Senses

Authors

Abstract

Pre-trained language models (LMs) encode rich information about linguistic structure, but their knowledge of lexical polysemy remains unclear. We propose a novel experimental setup for analyzing this knowledge in LMs specifically trained for different languages (English, French, Spanish, and Greek) and in multilingual BERT. We perform our analysis on datasets carefully designed to reflect different sense distributions, and we control for parameters that are highly correlated with polysemy, such as frequency and grammatical category. We demonstrate that BERT-derived representations reflect words’ polysemy level and their partitionability into senses. Polysemy-related information is more clearly present in English BERT embeddings, but models in other languages also manage to establish relevant distinctions between words at different polysemy levels. Our results contribute to a better understanding of the knowledge encoded in contextualized representations and open up new avenues for multilingual lexical semantics research.
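The following is not the paper’s actual experimental setup, only a minimal sketch of the kind of analysis the abstract describes: collecting BERT’s contextual representations of a target word across sentences and using their average pairwise cosine similarity (self-similarity) as a rough proxy for polysemy. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint; the helper names are hypothetical.

```python
# Minimal sketch (not the paper's exact method): how spread out are a word's
# contextual BERT vectors across sentences? Lower self-similarity is often
# read as a rough signal of higher polysemy.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vectors(word, sentences):
    """Collect the final-layer embedding of `word` in each sentence (first subword only)."""
    vecs = []
    target = tokenizer.tokenize(word)[0]               # first WordPiece of the target word
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        if target in tokens:
            vecs.append(hidden[tokens.index(target)])
    return torch.stack(vecs)

def self_similarity(vectors):
    """Average pairwise cosine similarity between a word's contextual vectors."""
    normed = torch.nn.functional.normalize(vectors, dim=-1)
    sims = normed @ normed.T
    n = sims.size(0)
    return (sims.sum() - n) / (n * (n - 1))             # exclude the diagonal (self-matches)

sentences = [
    "She sat on the river bank.",
    "He deposited the cash at the bank.",
    "The bank approved her loan.",
    "Fog drifted over the bank of the canal.",
]
print(float(self_similarity(word_vectors("bank", sentences))))
```

Under this reading, a highly polysemous word such as "bank" would be expected to show lower self-similarity across varied contexts than a monosemous word measured over the same number of sentences.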

Similar Articles

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Word embeddings are ubiquitous in NLP and information retrieval, but it’s unclear what they represent when the word is polysemous, i.e., has multiple senses. Here it is shown that multiple word senses reside in linear superposition within the word embedding and can be recovered by simple sparse coding. The success of the method —which applies to several embedding methods including word2vec— is ...
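As a rough, non-authoritative illustration of the idea described above (word senses recoverable as a sparse combination of learned "atoms"), the sketch below runs dictionary learning over toy random vectors with scikit-learn; the actual method operates on real embeddings such as word2vec, and the vector sizes, sparsity settings, and word index here are placeholders.

```python
# Illustrative only: learn an overcomplete dictionary of "atoms" over stand-in
# word vectors, then express one vector as a sparse combination of atoms.
# In the paper's setting, the few active atoms would correspond to senses.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 50))    # placeholder for real word2vec/GloVe vectors

learner = DictionaryLearning(
    n_components=100,                      # overcomplete: more atoms than dimensions
    transform_algorithm="lasso_lars",
    transform_alpha=0.5,                   # sparsity strength for the codes
    max_iter=20,
    random_state=0,
)
codes = learner.fit_transform(embeddings)  # sparse coefficients, shape (500, 100)

word_idx = 42                              # hypothetical polysemous word
active = np.flatnonzero(codes[word_idx])
print("active atoms:", active)             # each active atom ~ one sense direction
```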

Do words reveal concepts?

To study concepts, cognitive scientists must first identify some. The prevailing assumption is that they are revealed by words such as triangle, table, and robin. But languages vary dramatically in how they carve up the world by name. Either ordinary concepts must be heavily language-dependent or names cannot be a direct route to concepts. We asked English, Dutch, Spanish, and Japanese speakers...

Basic Units of Lexicons and Ontologies: Words, Senses and Concepts

Dictionaries and ontologies are very important resources not only for linguistic research and applications but also for other areas dealing with knowledge. In general, however, they fall short of our expectations. One reason for this under-expectation is that their basic units are not well-established. Dictionary head words have to be words rather than affixes or phrases. The meaning of a (head...

The polysemy of the words that children learn over time

Here we study polysemy as a potential learning bias in vocabulary learning in children. We employ a massive set of transcriptions of conversations between children and adults in English, to analyze the evolution of mean polysemy in the words produced by children whose ages range between 10 and 60 months. Our results show that mean polysemy in children increases over time in two phases, i.e. a f...
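The excerpt does not spell out the polysemy measure, but one common way to operationalize "mean polysemy" is to average WordNet sense counts over a word sample, as in the hedged sketch below (NLTK's WordNet interface is assumed, and the word lists are hypothetical rather than taken from the study's transcripts).

```python
# One common operationalization of "mean polysemy" (not necessarily the study's
# exact measure): average the number of WordNet senses over a sample of words.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def mean_polysemy(words):
    """Average WordNet sense count over the words that have at least one synset."""
    counts = [len(wn.synsets(w)) for w in words]
    counts = [c for c in counts if c > 0]
    return sum(counts) / len(counts) if counts else 0.0

# Hypothetical samples standing in for words produced at two different ages.
younger = ["ball", "dog", "milk", "go"]
older = ["because", "run", "play", "light", "draw"]
print(mean_polysemy(younger), mean_polysemy(older))
```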

The Partitionability Conjecture

In 1979, Richard Stanley made the following conjecture: Every Cohen–Macaulay simplicial complex is partitionable. Motivated by questions in the theory of face numbers of simplicial complexes, the Partitionability Conjecture sought to connect a purely combinatorial condition (partitionability) with an algebraic condition (Cohen–Macaulayness). The algebraic combinatorics community widely believed...

Journal

Journal: Transactions of the Association for Computational Linguistics

Year: 2021

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00400